Example Data Analysis with Pandas

Updated: July 11, 2014

This notebook shows some sample data analysis with Pandas.

First, let's import Pandas and set reasonable display options for plotting:

In [ ]:
import pandas as pd
pd.options.display.mpl_style = 'default'

Loading a Data Set

Now, let's get a data set. This one was selected essentially at random: a data set of home health care providers in the United States, available as a CSV file.

wget is a great command line tool for fetching data. The --timestamping option is especially helpful: it only downloads the file if the local copy is missing or out of date compared with the server's copy. This prevents you from re-downloading data every time you rerun the notebook.

In [3]:
!wget --timestamping https://data.medicare.gov/api/views/6jpm-sxkc/rows.csv

--2014-07-10 13:46:48--  https://data.medicare.gov/api/views/6jpm-sxkc/rows.csv
Resolving data.medicare.gov (data.medicare.gov)...
Connecting to data.medicare.gov (data.medicare.gov)||:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2014-07-10 13:46:48 ERROR 400: Bad Request.

The Pandas read_csv() function loads CSV data into a DataFrame. This function will also read from a URL, but by using wget and a local file, we don't hit the source server each time the cell/notebook is run.

In [5]:
data = pd.read_csv('rows.csv')
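
As a minimal sketch of what read_csv() hands back, the snippet below parses a tiny in-memory CSV instead of rows.csv. The three column names are taken from the real file, but the io.StringIO stand-in and the variable name sample are assumptions made for illustration only:

```python
import io

import pandas as pd

# Stand-in for rows.csv: a tiny CSV held in memory (illustrative rows only).
csv_text = """State,Zip,Type of Ownership
AL,36104,State/County
AL,35020,State/County
AL,35216,Proprietary
"""

# read_csv() accepts a path, a URL, or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # Zip is inferred as an integer column
```

Because the zip codes are inferred as integers, later comparisons such as data['Zip'] == 77036 use a number rather than a string.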

When studying a data set for the first time, it's sometimes easier to look at a large number of columns by taking just the first few rows and transposing them, so that each column becomes a row:

In [6]:

0 1 2
State AL AL AL
CMS Certification Number (CCN)* 17000 17008 17009
Zip 36104 35020 35216
Phone 3342065341 2059169500 2058242680
Type of Ownership State/County State/County Proprietary
Offers Nursing Care Services True True True
Offers Physical Therapy Services True True True
Offers Occupational Therapy Services True True True
Offers Speech Pathology Services True True True
Offers Medical Social Services True True True
Offers Home Health Aide Services True True True
Date Certified 07/01/1966 10/01/1972 01/18/1973
How often the home health team began their patients' care in a timely manner NaN 94 94
Footnote for how often the home health team began their patients' care in a timely manner This measure currently does not have data or h... NaN NaN
How often the home health team taught patients (or their family caregivers) about their drugs NaN 97 96
Footnote for how often the home health team taught patients (or their family caregivers) about their drugs This measure currently does not have data or h... NaN NaN
How often the home health team checked patients' risk of falling NaN 95 99
Footnote for how often the home health team checked patients' risk of falling This measure currently does not have data or h... NaN NaN
How often the home health team checked patients for depression NaN 100 100
Footnote for how often the home health team checked patients for depression This measure currently does not have data or h... NaN NaN
How often the home health team determined whether patients received a flu shot for the current flu season NaN 51 75
Footnote for how often the home health team determined whether patients received a flu shot for the current flu season This measure currently does not have data or h... NaN NaN
How often the home health team determined whether their patients received a pneumococcal vaccine (pneumonia shot) NaN 63 78
Footnote as how often the home health team determined whether their patients received a pneumococcal vaccine (pneumonia shot) This measure currently does not have data or h... NaN NaN
With diabetes, how often the home health team got doctor's orders, gave foot care, and taught patients about foot care NaN 100 97
Footnote for how often the home health team got doctor's orders, gave foot care, and taught patients about foot care This measure currently does not have data or h... NaN NaN
How often the home health team checked patients for pain NaN 99 100
Footnote for how often the home health team checked patients for pain This measure currently does not have data or h... NaN NaN
How often the home health team treated their patients' pain NaN 100 99
Footnote for often the home health team treated their patients' pain This measure currently does not have data or h... NaN NaN
How often the home health team treated heart failure (weakening of the heart) patients' symptoms NaN 100 97
Footnote for how often the home health team treated heart failure (weakening of the heart) patients' symptoms This measure currently does not have data or h... NaN NaN
How often the home health team took doctor-ordered action to prevent pressure sores (bed sores) NaN 99 98
Footnote for how often the home health team took doctor-ordered action to prevent pressure sores (bed sores) This measure currently does not have data or h... NaN NaN
How often the home health team included treatments to prevent pressure sores (bed sores) in the plan of care NaN 100 100
Footnote for how often the home health team included treatments to prevent pressure sores (bed sores) in the plan of care This measure currently does not have data or h... NaN NaN
How often the home health team checked patients for the risk of developing pressure sores (bed sores) NaN 100 100
Footnote for how often the home health team checked patients for the risk of developing pressure sores (bed sores) This measure currently does not have data or h... NaN NaN
How often patients got better at walking or moving around NaN 66 69
Footnote for how often patients got better at walking or moving around This measure currently does not have data or h... NaN NaN
How often patients got better at getting in and out of bed NaN 65 62
Footnote for how often patients got better at getting in and out of bed This measure currently does not have data or h... NaN NaN
How often patients got better at bathing NaN 63 76
Footnote for how often patients got better at bathing This measure currently does not have data or h... NaN NaN
How often patients had less pain when moving around NaN 69 69
Footnote for how often patients had less pain when moving around This measure currently does not have data or h... NaN NaN
How often patients' breathing improved NaN 53 72
Footnote for how often patients' breathing improved This measure currently does not have data or h... NaN NaN
How often patients' wounds improved or healed after an operation NaN 88 91
Footnote for how often patients' wounds improved or healed after an operation This measure currently does not have data or h... NaN NaN
How often patients got better at taking their drugs correctly by mouth NaN 51 61
Footnote for how often patients got better at taking their drugs correctly by mouth This measure currently does not have data or h... NaN NaN
How often patients receiving home health care needed urgent, unplanned care in the ER without being admitted NaN 10 10
Footnote for how often patients receiving home health care needed urgent, unplanned care in the ER without being admitted This measure currently does not have data or h... NaN NaN
How often home health patients had to be admitted to the hospital NaN 20 18
Footnote for how often home health patients had to be admitted to the hospital This measure currently does not have data or h... NaN NaN

60 rows × 3 columns
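
A preview shaped like the one above can be produced roughly as follows. This is only a sketch on a toy frame (the name toy and its fourth row are made up), and it assumes that head() followed by .T is an acceptable way to take a few rows and transpose them:

```python
import pandas as pd

# Toy frame standing in for the provider data; the last row is invented.
toy = pd.DataFrame({
    'State': ['AL', 'AL', 'AL', 'AK'],
    'Zip': [36104, 35020, 35216, 99508],
    'Type of Ownership': ['State/County', 'State/County',
                          'Proprietary', 'Proprietary'],
})

# Keep the first three rows, then transpose so each column becomes a row.
preview = toy.head(3).T
print(preview)
```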

We still have a little bit of cleanup to do. For example, the dates are strings and we need to convert them to datetime objects:

In [87]:
data['Date Certified'] = pd.to_datetime(data['Date Certified'])
data['Date Certified'].head()

0   1966-07-01
1   1972-10-01
2   1973-01-18
3   1975-07-24
4   1975-09-04
Name: Date Certified, dtype: datetime64[ns]
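
If you prefer not to rely on format inference, to_datetime() also takes an explicit format string. The sketch below assumes the file always uses month/day/year, which is consistent with the three sample values shown earlier; the names dates and parsed are illustrative:

```python
import pandas as pd

# The first three 'Date Certified' strings from the preview above.
dates = pd.Series(['07/01/1966', '10/01/1972', '01/18/1973'])

# An explicit format removes any month-first versus day-first ambiguity.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed)
```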

In addition, it's helpful to extract the year so we can group and plot by that variable:

In [93]:
data['Year Certified'] = pd.DatetimeIndex(data['Date Certified']).year
data['Year Certified'].head()

0    1966
1    1972
2    1973
3    1975
4    1975
Name: Year Certified, dtype: int32
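
On pandas versions that provide the .dt accessor on datetime columns, the same extraction can be written without building a DatetimeIndex. A small sketch with illustrative names, not the cell that produced the output above:

```python
import pandas as pd

certified = pd.to_datetime(pd.Series(['1966-07-01', '1972-10-01', '1973-01-18']))

# .dt exposes datetime components for each element of the Series.
years = certified.dt.year
print(years)
```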

Exploring the Data with Plotting

See the Pandas documentation for plotting.

How does ownership break down? Mostly private, which is not surprising:

In [8]:
data['Type of Ownership'].value_counts().plot(kind='bar');

Using the value_counts() method, we can count providers by state:

In [135]:

That's interesting. Why so many providers in Texas?
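
For reference, a per-state count like the one described above could be computed along these lines. The six state codes are toy values, and by_state is a name chosen just for this sketch:

```python
import pandas as pd

# Toy 'State' column; the real one has one entry per provider.
states = pd.Series(['TX', 'TX', 'TX', 'CA', 'CA', 'AL'], name='State')

# value_counts() returns counts sorted from most to least frequent.
by_state = states.value_counts()
print(by_state)

# by_state.plot(kind='bar') would then draw one bar per state.
```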

We can also look at how many providers were certified each year:

In [43]:
data['Year Certified'].hist(bins=40, figsize=(10,5));

<matplotlib.axes.AxesSubplot at 0x7f63143f3650>

Or, we can look at things cumulatively:

In [58]:
per_year = data['Year Certified'].value_counts().sort_index()
(per_year * 100.0 / len(data)).cumsum().plot(figsize=(15,5));
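
To see what the cumulative calculation does without the plot, here is the same arithmetic on six made-up certification years (the names years and cumulative_pct are illustrative):

```python
import pandas as pd

# Six made-up certification years.
years = pd.Series([2004, 2006, 2006, 2005, 2006, 2004])

# Count providers per year, ordered by year rather than by frequency.
per_year = years.value_counts().sort_index()

# Convert to a percentage of all providers, then accumulate year over year.
cumulative_pct = (per_year * 100.0 / len(years)).cumsum()
print(cumulative_pct)
```

The last value always reaches 100 as long as no year is missing, which makes the curve easy to sanity-check.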

Joining Datasets

You can sometimes find interesting insights by joining different datasets. Let's download some US Census population data for states:

In [136]:
!wget --timestamping http://www.census.gov/popest/data/national/totals/2013/files/NST-EST2013-popchg2010_2013.csv

--2014-07-10 15:56:19--  http://www.census.gov/popest/data/national/totals/2013/files/NST-EST2013-popchg2010_2013.csv
Resolving www.census.gov (www.census.gov)..., 2600:807:320:202:8500::e14, 2600:807:320:202:a000::e14
Connecting to www.census.gov (www.census.gov)||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10026 (9.8K) [text/plain]
Saving to: 'NST-EST2013-popchg2010_2013.csv'

100%[======================================>] 10,026      --.-K/s   in 0.001s  

2014-07-10 15:56:19 (16.5 MB/s) - 'NST-EST2013-popchg2010_2013.csv' saved [10026/10026]

In [175]:
census = pd.read_csv('NST-EST2013-popchg2010_2013.csv')

                               0                 1               2
SUMLEV                        10                20              20
REGION                         0                 1               2
STATE                          0                 0               0
NAME               United States  Northeast Region  Midwest Region
ESTIMATESBASE2010      308747716          55317261        66927549
POPESTIMATE2010        309326295          55376322        66976321
POPESTIMATE2011        311582564          55598499        67146663
POPESTIMATE2012        313873685          55771792        67321425
POPESTIMATE2013        316128839          55943073        67547890
NPOPCHG_2010              578579             59061           48772
NPOPCHG_2011             2256269            222177          170342
NPOPCHG_2012             2291121            173293          174762
NPOPCHG_2013             2255154            171281          226465
PPOPCHG_2010           0.1873954         0.1067678      0.07287283
PPOPCHG_2011           0.7294139          0.401213       0.2543317
PPOPCHG_2012           0.7353175         0.3116865       0.2602691
PPOPCHG_2013           0.7184909         0.3071104       0.3363937

31 rows × 3 columns

We need to do a little mapping and indexing work, so we can join on the state abbreviation (e.g. "WV"):

In [176]:
import usstates
state_abbrev = pd.DataFrame(usstates.states.keys(), index=usstates.states.values(), columns = ['ABBREV'])
census = census.join(state_abbrev, on='NAME', how='inner')
census = census.set_index("ABBREV")
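
The usstates module is not shown in this notebook, so the sketch below substitutes a hypothetical three-entry dictionary that maps abbreviations to full names. The populations are the 2013 estimates from the census file; the names states, abbrev, census_toy and joined are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in for usstates.states: abbreviation -> full state name.
states = {'AL': 'Alabama', 'AK': 'Alaska', 'AZ': 'Arizona'}

# Lookup table indexed by full name, with the abbreviation as its only column.
abbrev = pd.DataFrame({'ABBREV': list(states.keys())},
                      index=list(states.values()))

census_toy = pd.DataFrame({
    'NAME': ['United States', 'Alabama', 'Alaska', 'Arizona'],
    'POPESTIMATE2013': [316128839, 4833722, 735132, 6626624],
})

# Match census_toy['NAME'] against the lookup's index; the inner join drops
# rows without an abbreviation (the national total here).
joined = census_toy.join(abbrev, on='NAME', how='inner').set_index('ABBREV')
print(joined)
```

The inner join is what removes the national and regional summary rows, since they have no entry in the abbreviation table.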

Now we have a DataFrame that's indexed by state abbreviation:

In [177]:

AL 40 3 6 1 Alabama 4779758 4785570 4801627 4817528 4833722 5812 16057 15901 16194 0.121596 0.335530 0.331159 0.336148 23 23 ...
AK 40 4 9 2 Alaska 710231 713868 723375 730307 735132 3637 9507 6932 4825 0.512087 1.331759 0.958286 0.660681 47 47 ...
AZ 40 4 8 4 Arizona 6392015 6408790 6468796 6551149 6626624 16775 60006 82353 75475 0.262437 0.936308 1.273081 1.152088 16 16 ...
AR 40 3 7 5 Arkansas 2915916 2922280 2938506 2949828 2959373 6364 16226 11322 9545 0.218250 0.555251 0.385298 0.323578 32 32 ...
CA 40 4 9 6 California 37253959 37333601 37668681 37999878 38332521 79642 335080 331197 332643 0.213781 0.897529 0.879237 0.875379 1 1 ...

5 rows × 31 columns

We can join that with our dataset based on the 'State' column:

In [230]:
summary = census.join(pd.Series(data['State'].value_counts(), name = "per_state"))

Computing the number of residents per healthcare provider in each state (the column is called per_capita below, but it holds population divided by provider count):

In [238]:
summary["per_capita"] = summary["POPESTIMATE2013"] / summary["per_state"]
t = summary.sort("per_capita", ascending=True)[['NAME', 'per_capita']]

<matplotlib.axes.AxesSubplot at 0x7f6318b8f1d0>
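
The ratio above can be read in two directions, and it is easy to mix them up. The sketch below computes both on a toy summary table: the populations are the 2013 estimates for Alabama and Alaska from the census file, while the provider counts and the column names residents_per_provider and providers_per_100k are made up for illustration:

```python
import pandas as pd

# Real 2013 population estimates, but hypothetical provider counts.
summary_toy = pd.DataFrame(
    {'POPESTIMATE2013': [4833722, 735132], 'per_state': [150, 15]},
    index=['AL', 'AK'],
)

# Population divided by providers: a lower value means denser coverage.
summary_toy['residents_per_provider'] = (
    summary_toy['POPESTIMATE2013'] / summary_toy['per_state'])

# Providers divided by population, scaled to 100,000 residents.
summary_toy['providers_per_100k'] = (
    summary_toy['per_state'] * 100000.0 / summary_toy['POPESTIMATE2013'])

print(summary_toy.sort_values('residents_per_provider'))
```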

What's With Texas?

Let's look at the top zip codes:

In [21]:
top_zips = data['Zip'].value_counts()

77036    119
48075     77
43229     58
75243     51
77477     51
77074     48
77478     43
33186     40
33155     37
43231     34
dtype: int64

Wow! Zip code 77036 (in Houston, TX) has a lot of providers. Almost 1% of the providers in the entire United States are in that zip code. (Keep in mind that nothing in this data set describes the size of a provider, so it's entirely possible that these are a large number of small providers.)

In [97]:
top_zips[77036] * 100.0 / len(data)
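
An alternative way to get shares directly is value_counts(normalize=True), which returns fractions instead of counts. A sketch on eight made-up zip codes, where zips and pct are illustrative names:

```python
import pandas as pd

# Eight made-up provider zip codes.
zips = pd.Series([77036, 77036, 77036, 48075, 43229, 77036, 48075, 75243])

# normalize=True yields fractions of the total; scale to percentages.
pct = zips.value_counts(normalize=True) * 100.0
print(pct.loc[77036])
```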


When were all of the Texas providers certified?

In [99]:
tx = data[data['State'] == 'TX']
tx['Year Certified'].hist(bins=40, figsize=(10,5));

<matplotlib.axes.AxesSubplot at 0x7f631babff50>

Lots of providers certified in 2006:

In [207]:
tx_2006 = len(tx[tx['Year Certified'] == 2006])
tx_2006, tx_2006 * 100.0 / len(data)

(312, 2.506829503454925)

Let's drill into that Houston ZIP code:

In [240]:
# Take a copy so that columns can be added later without touching data.
zip_77036 = data[data['Zip'] == 77036].copy()

State CMS Certification Number (CCN)* Provider Name Address City Zip Phone Type of Ownership Offers Nursing Care Services Offers Physical Therapy Services Offers Occupational Therapy Services Offers Speech Pathology Services Offers Medical Social Services Offers Home Health Aide Services Date Certified How often the home health team began their patients' care in a timely manner Footnote for how often the home health team began their patients' care in a timely manner How often the home health team taught patients (or their family caregivers) about their drugs Footnote for how often the home health team taught patients (or their family caregivers) about their drugs How often the home health team checked patients' risk of falling
8861 TX 453102 CARITAS HEALTH CARE LLC 9788 CLAREWOOD DRIVE SUITE 208 HOUSTON 77036 7135540800 Proprietary True True True True True True 2004-05-04 96 NaN 93 NaN 100 ...
8889 TX 453132 STATES HEALTH INC 6666 HARWIN DRIVE SUITE 540 HOUSTON 77036 7135326800 Proprietary True True True True True True 2004-07-30 98 NaN 77 NaN 99 ...
8909 TX 453155 SEGNIK HEALTHCARE SERVICES 7001 CORPORATE DRIVE 302 HOUSTON 77036 7134848699 Proprietary True True True True True True 2004-09-08 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
8913 TX 453160 RESOURCE CARE CORPORATION 7211 REGENCY SQUARE 116 HOUSTON 77036 7139729090 Proprietary True True True True True True 2004-09-16 92 NaN 81 NaN 90 ...
9059 TX 457816 TRINITY HOME HEALTH SERVICES 8700 COMMERCE PARK SUITE 239 HOUSTON 77036 7137746363 Proprietary True True True True True True 2005-05-05 93 NaN 56 NaN 85 ...

5 rows × 61 columns

That's a lot of providers on the same streets, so let's extract just the street name into a new column:

In [ ]:
zip_77036['Street Name'] = [addr.split()[1] for addr in zip_77036['Address']]
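
The same extraction can be expressed with the vectorized .str accessor. This sketch uses three addresses from the data set and, like the list comprehension, assumes the street name is always the second whitespace-separated token, so multi-word names are cut to their first word:

```python
import pandas as pd

addresses = pd.Series([
    '6666 HARWIN DRIVE SUITE 540',
    '9898 BISSONNET SUITE 320',
    '6776 SOUTHWEST FREEWAY SUITE 500',
])

# Split on whitespace and take the second token of each address.
street = addresses.str.split().str[1]
print(street.tolist())
```

One difference is that .str[1] yields a missing value for an address with fewer than two tokens, whereas indexing the list directly raises an IndexError.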

Let's look at streets that have more than two providers:

In [113]:
name_counts = zip_77036['Street Name'].value_counts()
name_counts[name_counts > 2].plot(kind='bar',figsize=(20,5));

Zeroing in on the two most common streets, we find 66 providers:

In [116]:
mask = zip_77036['Street Name'] == 'HARWIN'
mask |= zip_77036['Street Name'] == 'BISSONNET'
cluster = zip_77036[mask]
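
When the list of streets grows, isin() builds the same kind of mask in one step. A sketch on a toy column (streets and cluster_toy are illustrative names):

```python
import pandas as pd

streets = pd.Series(['HARWIN', 'BISSONNET', 'CLAREWOOD', 'HARWIN', 'CORPORATE'])

# True wherever the street is one of the listed names.
mask = streets.isin(['HARWIN', 'BISSONNET'])
cluster_toy = streets[mask]
print(len(cluster_toy))
```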


In [117]:

State CMS Certification Number (CCN)* Provider Name Address City Zip Phone Type of Ownership Offers Nursing Care Services Offers Physical Therapy Services Offers Occupational Therapy Services Offers Speech Pathology Services Offers Medical Social Services Offers Home Health Aide Services Date Certified How often the home health team began their patients' care in a timely manner Footnote for how often the home health team began their patients' care in a timely manner How often the home health team taught patients (or their family caregivers) about their drugs Footnote for how often the home health team taught patients (or their family caregivers) about their drugs How often the home health team checked patients' risk of falling
8889 TX 453132 STATES HEALTH INC 6666 HARWIN DRIVE SUITE 540 HOUSTON 77036 7135326800 Proprietary True True True True True True 2004-07-30 98 NaN 77 NaN 99 ...
9071 TX 457829 ESSENCE HEALTH CARE INC 10101 HARWIN DRIVE SUITE 190 HOUSTON 77036 7137780523 Proprietary True True True True True False 2005-05-25 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
9090 TX 457853 UPHILL HOME HEALTH INC 7447 HARWIN 205 HOUSTON 77036 7139531187 Proprietary True True True True True True 2005-06-28 91 NaN 30 NaN 88 ...
9093 TX 457856 ALPHA HALOBET HEALTH CARE SERVICES INC 9898 BISSONNET SUITE 320 HOUSTON 77036 7137789199 Proprietary True True True True True True 2005-06-23 87 NaN 97 NaN 100 ...
9168 TX 457954 ABL HOMEHEALTH SERVICES INC 9888 BISSONNET STREET SUITE 135 HOUSTON 77036 2814988666 Proprietary True True True True True False 2005-11-30 97 NaN 100 NaN 100 ...
9236 TX 458141 TRADITIONS HEALTH CARE OF HOUSTON GALVESTON LLC 10333 HARWIN DRIVE SUITE 470 HOUSTON 77036 7132661062 Proprietary True True True True True True 1994-08-29 81 NaN 88 NaN 99 ...
9257 TX 458287 BELOVED HOME HEALTH SERVICES INC 9888 BISSONNET SUITE 430 HOUSTON 77036 7137769333 Proprietary True True True True True True 1995-01-10 57 NaN 98 NaN 98 ...
9305 TX 459006 VITAL AMBULATORY HEALTHCARE INC 6666 HARWIN DRIVE SUITE 350 HOUSTON 77036 7132706995 Private True True True True True True 1996-11-26 72 NaN 99 NaN 100 ...
9306 TX 459008 PRESTIGE HEALTH SERVICES, INC. 9898 BISSONNET STREET, SUITE #594 HOUSTON 77036 7137741195 Proprietary True True True True True True 1996-11-26 49 NaN 100 NaN 100 ...
9320 TX 459068 GC HEALTH SERVICES INC 9898 BISSONNET SUITE 426 HOUSTON 77036 7137763309 Proprietary True True True True True True 1997-01-16 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
9331 TX 459164 I CARE HOME HEALTH 10039 BISSONNET SUITE 228 HOUSTON 77036 7137797992 Private True True True True True True 1997-03-04 83 NaN 86 NaN 100 ...
9336 TX 459178 HUMANE HEALTH CARE INC 7457 HARWIN DRIVE SUITE 185 HOUSTON 77036 7137717277 Proprietary True False False False False True 1996-12-31 NaN This measure currently does not have data or h... NaN This measure currently does not have data or h... NaN ...
9355 TX 459292 TAWL HEALTH CARE INC 9898 BISSONNET SUITE 600 HOUSTON 77036 7137779171 Proprietary True True True True True True 1997-05-29 94 NaN 99 NaN 98 ...
10250 TX 673130 TREASURE HEALTHCARE, INC. 9898 BISSONNET STREET, SUITE #260 HOUSTON 77036 7139817629 Proprietary True True True True True True 2005-02-17 89 NaN 100 NaN 100 ...
10257 TX 673138 RESOURCE HEALTH CARE INC 7447 HARWIN SUITE 216 HOUSTON 77036 7132708880 Proprietary True True True True True True 2005-03-02 99 NaN 88 NaN 100 ...
10483 TX 677826 JALSTAD HEALTHCARE SERVICES 10101 HARWIN DRIVE #367 HOUSTON 77036 7132712967 Proprietary True True True True True True 2006-03-07 73 NaN 64 NaN 100 ...
10499 TX 677844 TEXAS TENDER CARE HOME INC 7457 HARWIN 325 HOUSTON 77036 7137828035 Proprietary True True True True True True 2006-01-18 97 NaN 100 NaN 100 ...
10521 TX 677870 ALL MODERN HEALTHCARE INC 10101 HARWIN DR SUITE 286 HOUSTON 77036 7136581000 Proprietary True True True True True True 2006-03-10 96 NaN 100 NaN 98 ...
10538 TX 677891 SIMPLEX HEALTH AND ALLIED SERVICES INC 6666 HARWIN DRIVE SUITE #668 HOUSTON 77036 7133347266 Proprietary True True True True True True 2006-04-03 95 NaN 100 NaN 98 ...
10548 TX 677901 METRO HEALTH SERVICE 9894 BISSONNET SUITE 365 HOUSTON 77036 7137779600 Proprietary True True True True True False 2006-04-12 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
10573 TX 677930 WESTNET HEALTHCARE PLUS INC 7457 HARWIN SUITE 294 HOUSTON 77036 7138279865 Proprietary True True True True True True 2006-04-27 100 NaN 95 NaN 100 ...
10584 TX 677942 ROYAL STAR HEALTHCARE INC 7457 HARWIN DRIVE SUITE 252 HOUSTON 77036 7135897019 Proprietary True True True True True True 2006-05-23 89 NaN 97 NaN 100 ...
10610 TX 677970 ELITTE HEALTHCARE AND SERVICE 9888 BISSONNET SUITE 100-F HOUSTON 77036 7137769399 Proprietary True True True True False False 2006-06-22 90 NaN 89 NaN 100 ...
10612 TX 677972 EMEX HOME HEALTH AGENCY INC 7457 HARWIN DRIVE SUITE 302 HOUSTON 77036 7133346949 Proprietary True True True False True True 2006-06-29 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
10625 TX 677986 EAGLESYS HEALTH CARE SERVICES INC 9894 BISSONNET ST # 912 HOUSTON 77036 7139956400 Proprietary True True True True True False 2006-07-10 NaN This measure currently does not have data or h... NaN This measure currently does not have data or h... NaN ...
10634 TX 677995 UNIFIED MEDICAL GROUP INC 10101 HARWIN DRIVE SUITE 260 HOUSTON 77036 7137721300 Proprietary True True True True True True 2006-06-21 77 NaN 100 NaN 96 ...
10785 TX 679068 EXCELS HEALTH CARE SERVICES INC 9898 BISSONNET SUITE 388 HOUSTON 77036 7137718826 Proprietary True True True True True True 2001-09-06 97 NaN 98 NaN 100 ...
10802 TX 679085 CASSEL HEALTH SERVICES 10333 HARWIN DR STE 575 HOUSTON 77036 7139889443 Proprietary True True True True True True 2001-11-19 96 NaN 87 NaN 97 ...
10808 TX 679093 MAXCARE HOME HEALTH SERVICES INC 10039 BISSONNET SUITE 338 HOUSTON 77036 7137770888 Proprietary True True True True True True 2001-12-31 95 NaN 33 NaN 98 ...
10855 TX 679149 TEXAS QUALITY HOME HEALTH INC 9888 BISSONNET STREET SUITE 570 HOUSTON 77036 7137781105 Proprietary True True True True True False 2002-07-24 84 NaN 95 NaN 100 ...
10907 TX 679212 EVANGEL HEALTH SERVICES INC 7111 HARWIN DRIVE SUITE 277 HOUSTON 77036 7134846900 Proprietary True True True True True True 2003-01-23 91 NaN 57 NaN 100 ...
10913 TX 679219 CHARLTON HOME HEALTH INC 9888 BISSONNET SUITE 268 HOUSTON 77036 7132712533 Proprietary True True True True True False 2003-02-10 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
10942 TX 679256 ACE HEALTHCARE SERVICES INC 6666 HARWIN DRIVE SUITE 475 HOUSTON 77036 7139786600 Proprietary True True True True True True 2003-04-11 99 NaN 100 NaN 100 ...
11023 TX 679346 REGENCY HEALTH SERVICES 9898 BISSONNET SUITE 250 HOUSTON 77036 7138000300 Proprietary True True True True True True 2003-09-23 94 NaN 91 NaN 100 ...
11070 TX 679398 KINA HEALTHCARE SERVICES INC 6666 HARWIN DR SUITE 290 HOUSTON 77036 7137762551 Proprietary True True True True True True 2003-12-23 82 NaN 98 NaN 82 ...
11103 TX 679434 ST FRANCIS HEALTH CARE SERVICES INC 9888 BISSONNET SUITE #370 HOUSTON 77036 7132712200 Proprietary True True True True True False 2004-02-20 99 NaN 100 NaN 99 ...
11161 TX 679497 KINGWOOD HOME HEALTH 9898 BISSONNET ST SUITE 593 HOUSTON 77036 7136231062 Proprietary True True True True True False 2004-06-10 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11182 TX 679518 ASSURANCEJ HOMECARE SERVICES INC 9894 BISSONNET STREET # 585 HOUSTON 77036 7139882618 Proprietary True True True True True True 2006-07-21 96 NaN 100 NaN NaN ...
11191 TX 679528 WESLEY HOME HEALTH SERVICES INC 10333 HARWIN DRIVE SUITE 520 HOUSTON 77036 7137729900 Proprietary True True True True True False 2006-08-11 100 NaN 100 NaN 100 ...
11207 TX 679544 CHARSONY MEDICAL SERVICES 10039 BISSONNET SUITE #227 HOUSTON 77036 7134848890 Proprietary True True True True True False 2006-08-31 98 NaN 94 NaN 100 ...
11247 TX 679590 UNITY CARE HOME HEALTH INC 9894 BISSONNET SUITE #595 HOUSTON 77036 2819698545 Proprietary True True True True True True 2006-10-25 85 NaN 97 NaN 100 ...
11266 TX 679609 NCJ HEALTH SYSTEM 9888 BISSONNET SUITE 440 HOUSTON 77036 7137724858 Proprietary True True True True True False 2007-01-12 46 NaN 92 NaN NaN ...
11276 TX 679619 UNIVERSAL HEALTH CARE 10101 HARWIN DR SUITE 130 HOUSTON 77036 7134847100 Proprietary True True True True True True 2007-04-03 70 NaN 97 NaN 100 ...
11346 TX 679692 EMPIRE HOME HEALTH SERVICES 9888 BISSONNET ST SUITE 246 HOUSTON 77036 2812771414 Proprietary True True True False True False 2007-11-02 96 NaN 100 NaN 100 ...
11378 TX 679728 XTRA-CARE HOME HEALTH INC 9894 BISSONNET SUITE 100-S HOUSTON 77036 7132701160 Proprietary True True True True True True 2007-12-15 98 NaN 100 NaN 100 ...
11394 TX 679745 DYNAMIC CARE HEALTH SERVICES INC 7447 HARWIN DRIVE STE 101 HOUSTON 77036 7137733100 Proprietary True True True True True True 2007-10-15 99 NaN 91 NaN 100 ...
11397 TX 679748 VENTEX HOME HEALTH AGENCY INC 10333 HARWIN DR SUITE 373 HOUSTON 77036 7132727273 Proprietary True True True True True True 2008-01-31 70 NaN 100 NaN NaN ...
11445 TX 679797 SEFAN HEALTHCARE SERVICES INC 9894 BISSONNET #770 HOUSTON 77036 7135412588 Proprietary True True True True True True 2008-04-07 NaN This measure currently does not have data or h... NaN This measure currently does not have data or h... NaN ...
11573 TX 747009 SUMIC CARE INCORPORATED 7111 HARWIN DR #140 HOUSTON 77036 7139880013 Proprietary True True True False False True 2008-12-22 92 NaN 98 NaN 100 ...
11596 TX 747032 CANDID HEALTH CARE SERVICES INC 7447 HARWIN DRIVE SUITE 106 HOUSTON 77036 8325128549 Proprietary True True True True True True 2008-11-07 98 NaN 98 NaN 100 ...
11659 TX 747096 FRIENDS HEALTH SERVICES 9894 BISSONNET SUITE 320 HOUSTON 77036 7137797739 Proprietary True True False False True True 2008-11-11 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11684 TX 747122 TEXAS HEALTHSOURCE INC 9888 BISSONNET ST SUITE 530 HOUSTON 77036 7133045851 Proprietary True True True True True True 2009-04-02 97 NaN 94 NaN 98 ...
11719 TX 747159 BARATON HEALTH CARE SERVICES LLC 9896 BISSONNET SUITE 315 HOUSTON 77036 7134008080 Proprietary True True False False False False 2009-09-14 61 NaN 100 NaN 100 ...
11738 TX 747179 SONICA HEALTHCARE GROUP INC 10333 HARWIN DR SUITE #415 HOUSTON 77036 7137742790 Proprietary True True True False False True 2009-07-16 76 NaN 60 NaN 97 ...
11744 TX 747185 WONDER HOME CARE INCOPORATION 10101 HARWIN DRIVE SUITE 201 HOUSTON 77036 7137716666 Proprietary True True True True True True 2009-06-23 88 NaN 99 NaN 100 ...
11751 TX 747193 INLAND HOME HEALTH LLC 9894 BISSONNET SUITE 675 HOUSTON 77036 7137718838 Proprietary True True True True True False 2009-09-01 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11850 TX 747299 INCARNATION HOME HEALTH SERVICES INC 7457 HARWIN DRIVE SUITE 340 HOUSTON 77036 2814475152 Proprietary True True True True True True 2009-11-25 81 NaN 94 NaN 100 ...
11884 TX 747333 CEDER HEALTHCARE SERVICES INC 9894 BISSONNET STREET SUITE #678 HOUSTON 77036 7138007000 Proprietary True True True True True True 2010-02-22 55 NaN 91 NaN 98 ...
11914 TX 747364 MEDPSYCH 10333 HARWIN DR SUITE 322 HOUSTON 77036 2815780019 Proprietary True True True True True True 2010-03-18 47 NaN 73 NaN 97 ...
11936 TX 747389 PAIX HEALTH SERVICES, INC. 7457 HARWIN DR SUITE #155 HOUSTON 77036 7133391110 Proprietary True True True True True True 2010-05-28 96 NaN 99 NaN 100 ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

66 rows × 62 columns

Now, let's look around Houston. Why are there so many providers?

In [241]:
houston = data.query("City == 'HOUSTON' & State == 'TX'")
houston_pop = 2161000
len(houston), houston_pop / len(houston)

(525, 4116)
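
Two details of that cell are worth a small sketch. First, query() takes a boolean expression over column names; second, the whole-number result suggests the notebook ran under Python 2, where / between integers truncates. That is an assumption, and under Python 3 the // operator gives the same figure. The providers frame below is a toy example:

```python
import pandas as pd

# Toy provider table; only the two columns used by the query.
providers = pd.DataFrame({
    'City': ['HOUSTON', 'HOUSTON', 'DALLAS', 'HOUSTON'],
    'State': ['TX', 'TX', 'TX', 'MN'],
})

# Both conditions must hold, so the HOUSTON row outside Texas is excluded.
houston_toy = providers.query("City == 'HOUSTON' & State == 'TX'")
print(len(houston_toy))

# Residents per provider with the figures used above, using floor division.
per_provider = 2161000 // 525
print(per_provider)
```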

Counting the providers by street name:

In [227]:
houston['Street Name'] = [addr.split()[1] for addr in houston['Address']]
houston['Street Name'].value_counts().head()

HARWIN       34
SOUTH        26
dtype: int64

In [226]:
houston[houston['Street Name'] == 'SOUTHWEST']

State CMS Certification Number (CCN)* Provider Name Address City Zip Phone Type of Ownership Offers Nursing Care Services Offers Physical Therapy Services Offers Occupational Therapy Services Offers Speech Pathology Services Offers Medical Social Services Offers Home Health Aide Services Date Certified How often the home health team began their patients' care in a timely manner Footnote for how often the home health team began their patients' care in a timely manner How often the home health team taught patients (or their family caregivers) about their drugs Footnote for how often the home health team taught patients (or their family caregivers) about their drugs How often the home health team checked patients' risk of falling
8864 TX 453105 OAKWEST HEALTHCARE SERVICES INC 6776 SOUTHWEST FREEWAY SUITE 500 HOUSTON 77074 7137809500 Proprietary True True True True True True 2004-06-01 93 NaN 19 NaN 86 ...
8870 TX 453111 STAR TORCH HEALTH CARE INC 9647 SOUTHWEST FREEWAY HOUSTON 77074 7137745900 Proprietary True True True True True True 2004-06-23 90 NaN 98 NaN 100 ...
8917 TX 453165 ANOINTED HOME HEALTH CARE SERVICES INC 6776 SOUTHWEST FREEWAY SUITE 220 HOUSTON 77074 8322425907 Proprietary True True True True True False 2004-10-14 99 NaN 99 NaN 100 ...
9063 TX 457820 CLASS HOME HEALTH 4615 SOUTHWEST FREEWAY SUITE 478 HOUSTON 77027 7138880500 Proprietary True True True True True True 2005-04-26 95 NaN 98 NaN 99 ...
9118 TX 457886 PLATINUM CARE INC 4615 SOUTHWEST FREEWAY SUITE 818 HOUSTON 77027 7135521159 Proprietary True True True True True True 2005-09-13 91 NaN 79 NaN 100 ...
9131 TX 457900 CARDINAL HEALTH SERVICES INC 9100 SOUTHWEST FREEWAY SUITE 102 HOUSTON 77074 7137714050 Proprietary True True True True True False 2005-10-05 93 NaN 91 NaN 91 ...
9145 TX 457916 PROTEAM HEALTHCARE INC 7324 SOUTHWEST FREEWAY SUITE 370 HOUSTON 77074 7138388044 Proprietary True True True True True False 2005-11-14 94 NaN 99 NaN 99 ...
9180 TX 457967 HCC HOME CARE INC 4635 SOUTHWEST FREEWAY SUITE 515 HOUSTON 77027 7136683883 Proprietary True True True True True True 2005-12-22 66 NaN 94 NaN 99 ...
9295 TX 458461 LIBERTYCARE HOME AND COMMUNITY SUPPORT SERVICES 8303 SOUTHWEST FREEWAY SUITE 710 HOUSTON 77074 2813421974 Proprietary True True True True True True 1995-06-13 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
9386 TX 459399 DYNAMIC HOME HEALTH SERVICES 8313 SOUTHWEST FWY SUITE 239 HOUSTON 77074 7132719010 Proprietary True True True True True True 1997-08-28 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
9388 TX 459410 REACHOUT HOMECARE INC 8303 SOUTHWEST FREEWAY SUITE 280 HOUSTON 77074 7137769118 Proprietary True True True True True True 1998-06-17 96 NaN 86 NaN 93 ...
9427 TX 459479 PROMED HOME CARE 4615 SOUTHWEST FREEWAY SUITE 725 HOUSTON 77027 7136261644 Proprietary True True True True True True 1999-10-21 91 NaN 98 NaN 100 ...
10246 TX 673126 CJ HOME HEALTH SERVICES 6776 SOUTHWEST FREEWAY STE 580 HOUSTON 77074 7137842883 Proprietary True True True True True True 2005-02-18 85 NaN 96 NaN 100 ...
10249 TX 673129 NATIONAL HOME HEALTH SERVICES INC 8303 SOUTHWEST FREEWAY SUITE 547 HOUSTON 77074 7132709890 Proprietary True True True True True True 2005-02-10 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
10262 TX 673143 SIGMAH HOME HEALTH SERVICES 7322 SOUTHWEST FREEWAY SUITE 1950 HOUSTON 77074 7137710606 Proprietary True True True True True True 2005-03-01 95 NaN 98 NaN 90 ...
10268 TX 673150 REDEMPTION HOME HEALTH SERVICES INC 8303 SOUTHWEST FREEWAY SUITE 702 HOUSTON 77074 7137715667 Proprietary True True True True True True 2005-03-02 95 NaN 90 NaN 100 ...
10308 TX 673193 MEGACARE HOME HEALTH SERVICES, INC. 8313 SOUTHWEST FREEWAY SUITE #217 HOUSTON 77074 7139950675 Proprietary True True True True True True 2005-04-07 75 NaN 95 NaN 97 ...
10551 TX 677906 PREFERRED HEALTH SERVICES INC 8323 SOUTHWEST FREEWAY SUITE 771 HOUSTON 77074 7137798288 Proprietary True True True True True True 2006-04-18 99 NaN 100 NaN 100 ...
10614 TX 677975 ST DAVID HOME HEALTH INC 7322 SOUTHWEST FREEWAY SUITE #490 HOUSTON 77074 7134145438 Proprietary True True True True True True 2006-06-28 97 NaN 100 NaN 100 ...
10650 TX 678086 COASTAL MEDICAL SERVICES INC 8303 SOUTHWEST FREEWAY SUITE 820 HOUSTON 77074 7137718470 Proprietary True True True True True True 1995-11-28 85 NaN 99 NaN 100 ...
10744 TX 679017 RELIABLE CARE HEALTH SERVICES 8323 SOUTHWEST FREEWAY SUITE 655 HOUSTON 77074 7137798861 Proprietary True True True True True True 2001-01-24 88 NaN 100 NaN 100 ...
10839 TX 679132 PINNACLE SENIOR CARE 7322 SOUTHWEST FREEWAY SUITE #170 HOUSTON 77074 7135321722 Proprietary True True True True True True 2002-05-21 87 NaN 95 NaN 92 ...
10917 TX 679223 PRESTIGE CARE HEALTH SERVICES INC 8313 SOUTHWEST FREEWAY SUITE #235 HOUSTON 77074 7132710105 Proprietary True True True True True True 2003-02-11 87 NaN 91 NaN 100 ...
10918 TX 679224 COMFORTHOME HEALTH CARE INC 8303 SOUTHWEST FREEWAY SUITE 770 HOUSTON 77074 7139882434 Proprietary True True True True True True 2003-02-13 100 NaN 97 NaN 99 ...
10993 TX 679315 HITECH MEDICAL SERVICES 9100 SOUTHWEST FREEWAY SUITE 212 HOUSTON 77074 7134574373 Proprietary True True True True True False 2003-08-05 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11010 TX 679332 NATIONS PIONEER HEALTH SERVICES INC 11224 SOUTHWEST FREEWAY SUITE 240 HOUSTON 77031 7135411987 Proprietary True True True True True True 2003-09-03 91 NaN 80 NaN 100 ...
11026 TX 679349 MGM VISION HEALTHCARE SERVICES INC 8303 SOUTHWEST FREEWAY #445 HOUSTON 77074 7137794560 Proprietary True True True True True True 2003-09-10 80 NaN 95 NaN 100 ...
11038 TX 679363 THE FOUR GROUP HOMECARE LLC 4615 SOUTHWEST FREEWAY SUITE # 400 HOUSTON 77027 7138401811 Proprietary True True True True True True 2003-09-26 67 NaN 81 NaN 100 ...
11058 TX 679385 EMMACO HOME HEALTH SERVICES INC 8303 SOUTHWEST FRWY SUITE 224 HOUSTON 77074 7137772376 Proprietary True True True True True True 2003-12-04 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11087 TX 679416 GENESIS HOME CARE 8323 SOUTHWEST FREEWAY SUITE 100 HOUSTON 77074 7139330427 Proprietary True True True True True True 2004-01-21 90 NaN 77 NaN 100 ...
11217 TX 679556 REHOBOTH HEALTHCARE SERVICES INC 8323 SOUTHWEST FREEWAY # 455 HOUSTON 77074 7132551070 Proprietary True True True True True True 2006-08-31 95 NaN NaN The number of patient episodes for this measur... NaN ...
11260 TX 679603 REHAB MED CARE 8313 SOUTHWEST FREEWAY SUITE 106 HOUSTON 77074 7134848132 Proprietary True True True False True True 2006-12-06 78 NaN 100 NaN 100 ...
11307 TX 679650 AMERICAN HEALTHCARE SERVICES 8323 SOUTHWEST FREEWAY SUITE 800 HOUSTON 77074 7139955884 Proprietary True True True True True True 2007-07-13 NaN This measure currently does not have data or h... NaN This measure currently does not have data or h... NaN ...
11316 TX 679659 OPT HOME HEALTHCARE INC 4615 SOUTHWEST FWY SUITE 477 HOUSTON 77027 7136220500 Proprietary True True True True True True 2007-08-03 83 NaN 98 NaN 98 ...
11317 TX 679660 HOME HEALTH PROFESSIONALS 4635 SOUTHWEST FREEWAY SUITE 540 HOUSTON 77027 7139420100 Proprietary True True True True True True 2007-09-01 97 NaN 99 NaN 99 ...
11362 TX 679710 PURITY HEALTH CARE INC 4615 SOUTHWEST FREEWAY STE 750 HOUSTON 77027 7132554360 Proprietary True True True True True True 2008-02-14 86 NaN 97 NaN 100 ...
11412 TX 679764 TTI HOME HEALTH CARE 4635 SOUTHWEST FREEWAY SUITE 182 HOUSTON 77027 7138500088 Proprietary True True True True True True 2008-06-05 83 NaN 90 NaN 99 ...
11433 TX 679785 M & M ADVANCED HEALTHCARE INC 4615 SOUTHWEST FREEWAY, SUITE 740 HOUSTON 77027 2818220150 Proprietary True True True True True False 2008-05-21 90 NaN 93 NaN 100 ...
11502 TX 743137 INSPIRATION HOME HEALTH 8303 SOUTHWEST FREEWAY # 700 HOUSTON 77074 7137770605 Proprietary True True True True True True 2008-06-17 100 NaN 30 NaN 100 ...
11572 TX 747008 SSA HOME HEALTH CARE 4635 SOUTHWEST FREEWAY SUITE #182 HOUSTON 77027 7139601188 Proprietary True True True True True True 2008-12-11 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11575 TX 747011 CAREPOINT HEALTH INC 7324 SOUTHWEST FRWY STE 540 HOUSTON 77074 7137717990 Proprietary True True False False False True 2008-10-24 94 NaN 97 NaN 100 ...
11595 TX 747031 ECCLESIASTES HOME HEALTHCARE INC 7322 SOUTHWEST FREEWAY SUITE 570 HOUSTON 77074 7137770196 Proprietary True True True True True True 2009-01-06 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...
11640 TX 747076 DELIGENT HEALTH SERVICES INC 8323 SOUTHWEST FREEWAY SUITE 233 HOUSTON 77074 7137772330 Proprietary True True True True True True 2009-01-16 96 NaN 99 NaN 100 ...
11665 TX 747102 LIFETOUCH HEALTH CARE SERVICES 8323 SOUTHWEST FREEWAY STE 505 HOUSTON 77074 7139887400 Proprietary True True True True True True 2009-01-21 85 NaN 94 NaN 99 ...
11729 TX 747170 ST CLARE HOME CARE, INC. 4635 SOUTHWEST FREEWAY, SUITE #303 HOUSTON 77027 7135724663 Proprietary True True True True False True 2009-04-16 91 NaN 96 NaN 97 ...
11895 TX 747345 CRESCENT HOME HEALTH INC 7322 SOUTHWEST FREEWAY SUITE #485 HOUSTON 77074 7134145837 Proprietary True True True True True True 2010-02-02 80 NaN 92 NaN 100 ...
11946 TX 747400 COUNTY HOME HEALTHCARE 7322 SOUTHWEST FREEWAY SUITE 660 HOUSTON 77074 7135414000 Proprietary True True True True True True 2010-07-26 96 NaN 100 NaN 100 ...
12026 TX 747484 A-1 ADVANTAGE HOME HEALTH SERVICES INC 4635 SOUTHWEST FWY SUITE 301 HOUSTON 77027 2819531500 Proprietary True True True True True True 2010-11-22 88 NaN 94 NaN 98 ...
12250 TX 747711 3 ALPINE HOME HEALTH 8303 SOUTHWEST FWY STE 338 HOUSTON 77074 2818859271 Proprietary True True True True True False 2012-06-28 83 NaN 100 NaN 98 ...
12263 TX 747724 JUSTICE HEALTHCARE GROUP INCORPORATED 7324 SOUTHWEST FREEWAY SUITE 660 HOUSTON 77074 7132712621 Proprietary True True True True True True 2012-05-07 NaN The number of patient episodes for this measur... NaN The number of patient episodes for this measur... NaN ...

50 rows × 62 columns

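Who knows what's going on, but it's an interesting cluster: every provider visible in this listing sits on Houston's Southwest Freeway, and many of them share the same street address and differ only by suite number.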
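One way to put a number on a cluster like this is to strip the suite from each address and count providers per building. The cell below is only a rough sketch: it uses a tiny hand-built frame with a few rows copied from the output above, and the column names 'Name' and 'Address' are assumptions, since the real column names may differ. The normalization rules (treating FWY and FRWY as FREEWAY, dropping everything from SUITE, STE, or # onward) are also just a guess at what is needed for these particular rows.

```python
import pandas as pd

# Hand-built sample using a few rows from the listing above.
# The column names 'Name' and 'Address' are assumptions, not the real headers.
sample = pd.DataFrame({
    'Name': [
        'REACHOUT HOMECARE INC',
        'NATIONAL HOME HEALTH SERVICES INC',
        'PROMED HOME CARE',
        'OPT HOME HEALTHCARE INC',
        'DYNAMIC HOME HEALTH SERVICES',
    ],
    'Address': [
        '8303 SOUTHWEST FREEWAY SUITE 280',
        '8303 SOUTHWEST FREEWAY SUITE 547',
        '4615 SOUTHWEST FREEWAY SUITE 725',
        '4615 SOUTHWEST FWY SUITE 477',
        '8313 SOUTHWEST FWY SUITE 239',
    ],
})

# Reduce each address to its building: spell out the freeway abbreviations,
# then drop the suite designator and everything after it.
building = (
    sample['Address']
    .str.upper()
    .str.replace(r'\b(?:FWY|FRWY)\b', 'FREEWAY', regex=True)
    .str.replace(r'[,\s]*(?:\bSUITE\b|\bSTE\b|#).*$', '', regex=True)
    .str.strip()
)

# Count how many providers share each building.
counts = building.value_counts()
print(counts)
```

On the full data set, the same idea would apply to whichever column holds the street address, and buildings with unusually high counts would be the ones worth a closer look.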

After this analysis, I did some Googling, and guess what? The Attorney General and Department of Health and Human Services have been targeting Houston for Medicare fraud. For example, see:
